- Data import
- Principles of R
- Data tidying
- Join tables
07 November 2018
read_csv takes the first 1000 lines of a table to guess the column type.
read_csv("a\na")
## # A tibble: 1 x 1
##   a
##   <chr>
## 1 a
read_csv("a\n1")
## # A tibble: 1 x 1
##       a
##   <int>
## 1     1
read_csv("a\n2010-01-01")
## # A tibble: 1 x 1
##   a
##   <date>
## 1 2010-01-01
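Because the guess is based only on the first rows, it can help to state the column types explicitly or to let readr inspect more rows. The following is a minimal sketch; the column names `id` and `day` and the inline CSV text are made up for illustration, while `col_types` and `guess_max` are regular `read_csv` arguments.

```r
library(readr)

# Inline CSV text, in the style of the examples above (a string that
# contains a newline is read as literal data). Hypothetical columns.
csv_text <- "id,day\n1,2010-01-01\n2,2010-01-02\n"

# Declare the column types instead of relying on the guess.
df <- read_csv(
  csv_text,
  col_types = cols(
    id  = col_integer(),
    day = col_date()
  )
)

# Alternatively, let readr look at more rows before guessing.
df_guessed <- read_csv(csv_text, guess_max = 5000)
```

Explicit `col_types` make the import reproducible even if later rows of a file look different from the first ones.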
Problems occur if:
haven
df <-
  file %>%
  read_delim(delim = ",", na = c("", "NA"),
             skip = 0, ...)
becomes …
df <- file %>% read_csv()
by:
read_delim has the same interface as read_csv, with additional parameters
tidyr (part of the tidyverse)
- gather summarizes column names as a new column
- spread spreads variable levels as new column names
spread and gather
gather: example
table4a
## # A tibble: 3 x 3
##   country     `1999` `2000`
## * <chr>        <int>  <int>
## 1 Afghanistan    745   2666
## 2 Brazil       37737  80488
## 3 China       212258 213766
Fill in the corresponding lines in the code to tidy the table
table4a %>%
  gather(
    # Name of the key column, i.e. the new variable whose values are
    # the former column names
    key = ,
    # Name of the column in which the former cell values are stored
    value = ,
    # Specify the columns which are variable levels
    ...)
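One possible way to fill in the blanks is sketched below. The new column names `year` and `cases` are a choice made here for illustration; the exercise itself does not prescribe them. `table4a` ships with tidyr.

```r
library(tidyr)

# Collect the year columns into a key column ("year") and move the
# former cell values into a value column ("cases").
tidy4a <- table4a %>%
  gather(key = "year", value = "cases", `1999`, `2000`)

tidy4a
```

The backticks are needed because `1999` and `2000` are not syntactic R names. In tidyr 1.0.0 and later, `pivot_longer()` is the recommended successor to `gather()`.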
spread: example
table2
## # A tibble: 12 x 4
##    country      year type            count
##    <chr>       <int> <chr>           <int>
##  1 Afghanistan  1999 cases             745
##  2 Afghanistan  1999 population   19987071
##  3 Afghanistan  2000 cases            2666
##  4 Afghanistan  2000 population   20595360
##  5 Brazil       1999 cases           37737
##  6 Brazil       1999 population  172006362
##  7 Brazil       2000 cases           80488
##  8 Brazil       2000 population  174504898
##  9 China        1999 cases          212258
## 10 China        1999 population 1272915272
## 11 China        2000 cases          213766
## 12 China        2000 population 1280428583
Fill in the corresponding lines in the code to tidy the table
table2 %>%
  spread(
    # the variable which gives the new column names
    key = ,
    # the variable which gives the new column cell values
    value =
  )
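A possible solution, offered as a sketch rather than the only correct answer: `type` holds the future column names and `count` holds the cell values. `table2` ships with tidyr.

```r
library(tidyr)

# Turn the levels of `type` (cases, population) into columns and fill
# them with the matching values from `count`.
tidy2 <- table2 %>%
  spread(key = type, value = count)

tidy2
```

Each country and year combination now occupies a single row. In tidyr 1.0.0 and later, `pivot_wider()` is the recommended successor to `spread()`.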
*_join functions in dplyr
table1 %>% *_join(table2, by = "column(s)")
band_instruments
## # A tibble: 3 x 2
##   name  plays
##   <chr> <chr>
## 1 John  guitar
## 2 Paul  bass
## 3 Keith guitar
band_members
## # A tibble: 3 x 2
##   name  band
##   <chr> <chr>
## 1 Mick  Stones
## 2 John  Beatles
## 3 Paul  Beatles
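The `table1 %>% *_join(table2, by = ...)` pattern can be tried on the two tables above, which both ship with dplyr. The sketch below joins on the shared `name` column; the result names (`inner`, `left`, ...) are chosen here for illustration.

```r
library(dplyr)

# Keep only members who appear in both tables (John, Paul).
inner <- band_members %>% inner_join(band_instruments, by = "name")

# Keep every band member; Mick has no instrument, so `plays` is NA.
left <- band_members %>% left_join(band_instruments, by = "name")

# Keep every instrument row; Keith has no band, so `band` is NA.
right <- band_members %>% right_join(band_instruments, by = "name")

# Keep all rows from both tables.
full <- band_members %>% full_join(band_instruments, by = "name")
```

The four variants differ only in which unmatched rows they keep; the matched rows are the same in every case.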
inner_join
left_join
right_join
full_join
band_members and the instruments which they are playing.
table1
- semi_join keeps the observations which can be found in table2
- anti_join keeps all other observations
tidyr functions)
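The filtering joins can be sketched with the same two example tables from dplyr. Unlike the joins that add columns, they only filter the rows of the first table. The result names `with_instrument` and `without_instrument` are made up for this example.

```r
library(dplyr)

# Members that have a match in band_instruments (John, Paul).
with_instrument <- band_members %>%
  semi_join(band_instruments, by = "name")

# Members without a match in band_instruments (Mick).
without_instrument <- band_members %>%
  anti_join(band_instruments, by = "name")
```

Both results contain only the columns of `band_members`; no column from `band_instruments` is added.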